Unsupervised and knowledge-poor approaches to sentiment analysis
Sentiment analysis focuses on the automatic classification of a document's sentiment (and, more generally, on the extraction of opinion from text). The ways in which sentiment is expressed have been shown to depend on what a document is about (domain dependency). This complicates supervised methods for sentiment analysis, which rely heavily on training data or linguistic resources that are usually either domain-specific or generic. Neither kind of resource lets classifiers perform well across a range of domains, since this requires appropriate in-domain (domain-specific) data.
This thesis presents a novel unsupervised, knowledge-poor approach to sentiment analysis aimed at creating a domain-independent and multilingual sentiment analysis system.
The approach extracts domain-specific resources from the documents that are to be processed and uses them for sentiment analysis. It does not require any training corpora, large sets of rules or generic sentiment lexicons, which makes it domain- and language-independent while still able to utilise domain- and language-specific information.
The thesis describes and tests the approach, applying it to different data, including customer reviews of various types of products, reviews of films and books, and news items, and to four languages: Chinese, English, Russian and Japanese. The approach is applied not only to binary sentiment classification but also to three-way sentiment classification (positive, negative and neutral), to subjectivity classification of documents and sentences, and to the extraction of opinion holders and opinion targets. Experimental results suggest that the approach is often a viable alternative to supervised systems, especially when applied to large document collections.
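The abstract does not spell out how domain-specific resources are extracted from the documents being processed. One generic way to do this, shown here only as an illustrative sketch and not as the thesis's procedure, is a self-training loop: label the unlabelled collection with a tiny seed lexicon, then add words that occur repeatedly in documents of only one predicted polarity. The seed words, the `min_count` threshold, the toy documents and the function names are all assumptions.

```python
from collections import Counter
from typing import Dict, List


def classify(tokens: List[str], lexicon: Dict[str, int]) -> int:
    """Return +1 (positive), -1 (negative) or 0 (neutral) from summed word polarities."""
    score = sum(lexicon.get(t, 0) for t in tokens)
    return (score > 0) - (score < 0)


def self_train(docs: List[List[str]], seeds: Dict[str, int],
               rounds: int = 3, min_count: int = 2) -> Dict[str, int]:
    """Grow a lexicon from unlabelled in-domain documents, starting from seed words.

    Hypothetical procedure for illustration only.
    """
    lexicon = dict(seeds)
    for _ in range(rounds):
        pos, neg = Counter(), Counter()
        # Label every document with the current lexicon; count words per predicted class.
        for tokens in docs:
            label = classify(tokens, lexicon)
            if label > 0:
                pos.update(set(tokens))
            elif label < 0:
                neg.update(set(tokens))
        # Add words seen at least min_count times, and only ever in one class.
        for word in set(pos) | set(neg):
            if word in lexicon:
                continue
            if pos[word] >= min_count and neg[word] == 0:
                lexicon[word] = 1
            elif neg[word] >= min_count and pos[word] == 0:
                lexicon[word] = -1
    return lexicon


# Toy, made-up in-domain documents (already tokenised).
docs = [["good", "sturdy", "phone"], ["good", "sturdy", "case"],
        ["bad", "flimsy", "phone"], ["bad", "flimsy", "case"], ["sturdy", "build"]]
lexicon = self_train(docs, {"good": 1, "bad": -1})
print(sorted(lexicon.items()))  # [('bad', -1), ('flimsy', -1), ('good', 1), ('sturdy', 1)]
```

The learned words ("sturdy", "flimsy") then let the loop label documents that contain no seed word at all. Such a loop can also absorb topic words that happen to co-occur with one polarity, so a realistic system would need additional filtering.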
Basic Units for Chinese Opinionated Information Retrieval
This paper presents the results of experiments in which the authors tested different types of features for the retrieval of Chinese opinionated texts. We assume that opinionated information retrieval (OIR) can be regarded as a subtask of general IR, but with some distinct features. The experiments showed that the best results were obtained by combining character-based processing, dictionary lookup (maximum matching) and a negation check.
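The three ingredients named above can be illustrated with a minimal sketch. It assumes forward (left-to-right) maximum matching against a toy dictionary, a tiny hypothetical negation list, and a hypothetical `NOT_` prefix for negated words; the dictionary, the negation list and all function names are illustrative, not the paper's implementation.

```python
from typing import List, Set

NEGATIONS = {"不", "没"}  # hypothetical minimal negation list


def forward_max_match(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Segment text left to right, always taking the longest dictionary word
    (up to max_len characters) and falling back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens


def mark_negation(tokens: List[str]) -> List[str]:
    """Drop negation words and prefix the token that follows with 'NOT_'."""
    out = []
    negate = False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True
            continue
        out.append("NOT_" + tok if negate else tok)
        negate = False
    return out


def index_units(text: str, dictionary: Set[str]) -> List[str]:
    """Combine single characters with negation-marked dictionary words as index units."""
    return list(text) + mark_negation(forward_max_match(text, dictionary))


dictionary = {"手机", "质量", "不错"}  # toy dictionary
print(index_units("手机质量不好", dictionary))
# ['手', '机', '质', '量', '不', '好', '手机', '质量', 'NOT_好']
```

Running dictionary lookup before the negation check matters: because "不错" ("pretty good") is a dictionary word, "手机不错" is segmented as "手机" + "不错" and is not mistaken for a negated "错".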
Automatic seed word selection for unsupervised sentiment classification of Chinese text
We describe and evaluate a new method of automatic seed word selection for unsupervised sentiment classification of product reviews in Chinese. The whole method is unsupervised and does not require any annotated training data; it only requires information about commonly occurring negations and adverbials. Unsupervised techniques are promising for this task because they avoid the domain-dependency problems typically associated with supervised methods. The results obtained are close to, and sometimes better than, those of supervised classifiers, reaching an F1 of up to 92%.
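The abstract does not describe how seed words are selected or how their polarity is determined. The sketch below only illustrates the two ingredients it names, negations and adverbials, under clearly stated assumptions: a hypothetical heuristic that treats words frequently following a degree adverbial such as "很" ("very") as candidate sentiment words, and a scorer that flips the polarity of a seed directly preceded by a negation. Seed polarities are supplied by hand here, ties are reported as "undecided", and the word lists, toy documents and function names are illustrative.

```python
from collections import Counter
from typing import Dict, List, Set

ADVERBIALS = {"很", "非常"}  # hypothetical degree adverbials ("very")
NEGATIONS = {"不", "没"}     # hypothetical negations


def candidate_seeds(docs: List[List[str]], min_count: int = 2) -> Set[str]:
    """Hypothetical heuristic: words that follow a degree adverbial at least
    min_count times are treated as candidate sentiment words."""
    counts = Counter()
    for tokens in docs:
        for prev, tok in zip(tokens, tokens[1:]):
            if prev in ADVERBIALS and tok not in NEGATIONS:
                counts[tok] += 1
    return {word for word, c in counts.items() if c >= min_count}


def classify(tokens: List[str], polarity: Dict[str, int]) -> str:
    """Sum seed polarities, flipping the sign of a seed directly preceded by a negation."""
    score = 0
    for i, tok in enumerate(tokens):
        if tok in polarity:
            negated = i > 0 and tokens[i - 1] in NEGATIONS
            score += -polarity[tok] if negated else polarity[tok]
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "undecided"


# Toy, already-segmented reviews.
docs = [["屏幕", "很", "好"], ["电池", "很", "差"], ["服务", "很", "好"], ["价格", "很", "差"]]
print(candidate_seeds(docs))
polarity = {"好": 1, "差": -1}  # polarities assigned by hand for this sketch
print(classify(["质量", "不", "好"], polarity))  # negative
```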
Multilingual opinion holder and target extraction using knowledge-poor techniques
We describe an approach to multilingual sentiment analysis, in particular opinion holder and opinion target extraction, which requires no annotated data and minimal language-specific input. The approach is based on unsupervised, knowledge-poor techniques which facilitate adaptation to new languages and domains. The system's results are comparable to those of supervised, language-specific systems previously applied to the NTCIR-7 MOAT evaluation data.
Comparable Domain Dependency in Sentiment Analysis
Sentiment analysis (or opinion mining) is concerned not with the topic of a document or its factual content, but with the opinion expressed in it. In this paper we present a number of experiments in word-based sentiment analysis on two corpora representing two related domains: film reviews and book reviews. We find that even closely related domains are very difficult to process without utilising in-domain data. We also point out certain characteristics of features that affect the cross-domain performance of sentiment classifiers.
Sentiment analysis is aimed not at the topical or factual content of a text but at the evaluations and subjective statements it contains. In this publication we present the results of experiments in automatic lexicon-based sentiment analysis on two corpora of texts that are close in genre and topic: film reviews and book reviews. We found that even for topically close texts, effective sentiment classification is difficult without using information from the corpus being processed. We also identified certain characteristics of the lexicon that affect sentiment classification of texts.
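To make the cross-domain setting concrete, here is a minimal sketch, assuming a word-based classifier whose word weights are learned from labelled documents in a source domain (film reviews) and then applied to a target domain (book reviews). The add-one-smoothed log-ratio weighting, the `coverage` measure (share of target tokens the source lexicon knows) and the toy documents are assumptions for illustration, not the paper's method, features or data.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Doc = Tuple[List[str], int]  # (tokens, label), label is +1 or -1


def learn_lexicon(docs: List[Doc]) -> Dict[str, float]:
    """Weight each word by the add-one-smoothed log ratio of its relative
    frequency in positive versus negative documents."""
    pos, neg = Counter(), Counter()
    for tokens, label in docs:
        (pos if label > 0 else neg).update(tokens)
    vocab = set(pos) | set(neg)
    pos_total = sum(pos.values()) + len(vocab)
    neg_total = sum(neg.values()) + len(vocab)
    return {w: math.log((pos[w] + 1) / pos_total) - math.log((neg[w] + 1) / neg_total)
            for w in vocab}


def predict(tokens: List[str], lexicon: Dict[str, float]) -> int:
    """Sum known word weights; unknown words contribute 0 and ties go to +1."""
    return 1 if sum(lexicon.get(t, 0.0) for t in tokens) >= 0 else -1


def accuracy(docs: List[Doc], lexicon: Dict[str, float]) -> float:
    return sum(predict(tokens, lexicon) == label for tokens, label in docs) / len(docs)


def coverage(docs: List[Doc], lexicon: Dict[str, float]) -> float:
    """Fraction of target-domain tokens that the lexicon has a weight for."""
    tokens = [t for doc_tokens, _ in docs for t in doc_tokens]
    return sum(t in lexicon for t in tokens) / len(tokens)


# Toy, made-up documents for a source and a target domain.
film = [(["gripping", "plot", "great", "acting"], 1), (["dull", "plot", "wooden", "acting"], -1)]
book = [(["gripping", "prose", "page-turner"], 1), (["dull", "prose", "tedious"], -1)]
film_lexicon = learn_lexicon(film)
print(accuracy(book, film_lexicon), coverage(book, film_lexicon))
```

On these toy documents only a third of the book-review tokens receive a weight from the film lexicon, so every target-domain decision rests on the two shared words; with real corpora, comparing accuracy and coverage in-domain versus cross-domain is one simple way to probe how much a word-based classifier depends on in-domain data.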